Analysis for REPERA: A Hybrid Data Protection Mechanism in Distributed Environment

Authors

  • Longbin Lai
  • Linfeng Shen
  • Yanfei Zheng
  • Kefei Chen
  • Jing Zhang
Abstract

Distributed systems, especially those providing cloud services, endeavor to construct sufficiently reliable storage in order to attract more customers. Generally, pure replication and erasure code are widely adopted in distributed systems to guarantee reliable data storage, yet both contain deficiencies. Pure replication consumes too much extra storage and bandwidth, while erasure code is less efficient and suitable only for read-only contexts. The authors propose REPERA as a hybrid mechanism combining pure replication and erasure code to leverage their advantages and mitigate their shortcomings. This paper qualitatively compares fault-resilient distributed architectures built with pure replication, erasure code, and REPERA. The authors show that systems employing REPERA share with erasure-resilient systems higher availability and more durable storage, with space and bandwidth consumption similar to that of replicated systems. They further show that systems employing REPERA, on the one hand, obtain higher availability compared to erasure-resilient systems and, on the other hand, benefit from more durable storage compared to replicated systems. Furthermore, since REPERA was developed on an open platform, the authors prepare an experiment to evaluate its performance by comparing it with the original system.

DOI: 10.4018/ijcac.2012010105

Introduction (excerpt)

… information. However, cheap components come with potential hardware failures, while public exposure facilitates hacking attacks and viruses. Furthermore, disk drives are statistically the most commonly replaced hardware components in large storage systems, accounting for almost 30% to 50% of all hardware replacements (Schroeder & Gibson, 2007). Thereby, a mechanism providing reliable storage turns out to be one of the most essential components in the design of distributed systems. Generally, pure replication and erasure code are introduced to guarantee highly available storage, and they have obtained widespread adoption in distributed systems. Google File System (Ghemawat, Gobioff, & Shun-Tak, 2003), Pastry (Rowstron & Druschel, 2007), and Tapestry (Zhao, Kubiatowicz, & Joseph, 2001) are typical replicated systems, in which data is mirrored not only to tolerate failures, but also to support concurrent data access in consideration of latency reduction. However, pure replication consumes too much extra storage and bandwidth, hence boosting the cost of deploying additional devices and management. Erasure code has then been introduced into storage systems to increase storage durability (also known as the expected mean time to failure, or MTTF, of losing any data unit) and to reduce the investment in massive storage as well as the extra transfer and maintenance overhead for the replicas. The key idea behind erasure code is that m blocks of source data are encoded to produce n blocks of encoded data, in such a way that any subset of m encoded blocks suffices to reconstruct the source data. Such a code is called an (n, m) erasure code and allows up to n − m losses in a group of n encoded blocks (Rizzo, 1997).
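As a minimal, hypothetical illustration of the (n, m) definition above (not the coding scheme used in REPERA), the following Python sketch implements a toy (3, 2) code with a single XOR parity block: m = 2 source blocks are encoded into n = 3 blocks, and any 2 of the 3 suffice to reconstruct the source, so up to n − m = 1 loss is tolerated.

    # Toy (n, m) = (3, 2) erasure code using XOR parity.
    # Illustration only, not the coding scheme used in REPERA.

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(block0: bytes, block1: bytes) -> list:
        """Systematic encoding: keep the two source blocks, add one parity block."""
        parity = xor_bytes(block0, block1)
        return [block0, block1, parity]          # n = 3 encoded blocks

    def decode(available: dict) -> tuple:
        """Reconstruct the m = 2 source blocks from any 2 of the 3 encoded blocks."""
        if 0 in available and 1 in available:
            return available[0], available[1]
        if 0 in available and 2 in available:    # block1 = block0 XOR parity
            return available[0], xor_bytes(available[0], available[2])
        if 1 in available and 2 in available:    # block0 = block1 XOR parity
            return xor_bytes(available[1], available[2]), available[1]
        raise ValueError("need at least m = 2 of the n = 3 encoded blocks")

    if __name__ == "__main__":
        blocks = encode(b"hello by", b"te data!")
        # Lose block 1 (any single loss is tolerated, since n - m = 1):
        recovered = decode({0: blocks[0], 2: blocks[2]})
        assert recovered == (b"hello by", b"te data!")

Practical systems use Reed-Solomon-style codes that support arbitrary (n, m), but the reconstruction property being illustrated is the same.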
Considering data of size x, 3-way replication consumes 3x storage space as well as transfer bandwidth, while a (7, 4) erasure code takes up only about (7/4)x and can suffer up to a 3-disk failure (a short sketch of this arithmetic appears at the end of this excerpt). Chen, Edler, Goldberg, Gottlieb, Sobti, and Yianilos (1999) initially implemented erasure code in the distributed context to build a highly efficient archival intermemory. Aguilera and Janakiraman (2005) presented methods of efficiently using erasure code in distributed systems. Weatherspoon and Kubiatowicz (2002) pointed out that systems employing erasure code have an MTTF many orders of magnitude higher than replicated systems with similar storage and bandwidth requirements, and conversely, erasure-resilient systems take up much less bandwidth and storage to provide durability similar to that of replicated systems. However, besides the extra computation for the coding procedure, the erasure coding mechanism disappoints system designers with a longer access delay, since multiple servers containing the erasure stripes have to be contacted to read a single block, and it encounters bottlenecks when either modifications or failures happen frequently, which may introduce excessive coding processes that can possibly drain the system.

From the aforementioned discussion, we notice that, on the one hand, pure replicated systems benefit from high access performance yet consume intolerable extra resources; on the other hand, pure erasure-resilient systems achieve more durable storage yet may not be efficient in reducing access delay and are only suitable for read-only contexts. A natural idea is to combine them in a single system. Kubiatowicz et al. (2000) introduced a hybrid method coordinated with a versioning mechanism in OceanStore. They classified objects into an active form, representing the latest version of the data, which is replicated for performance considerations, and an archival form, indicating a permanent, read-only version of the object, which is therefore encoded with erasure code to achieve more durable storage (Kubiatowicz et al., 2000). However, versioning itself brings in overhead, and for some large storage systems data in older versions is not often useful, which, as a matter of fact, leaves OceanStore a pure mirroring system. We propose REPERA to truly combine replication and erasure code, with the objective of leveraging their advantages and mitigating their deficiencies.

The remainder of this paper is organized as follows. Section 2 presents the main idea of designing REPERA. A qualitative analysis of REPERA will be specifically …
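For concreteness, the back-of-the-envelope comparison quoted above can be reproduced with a short sketch (assuming data of size x and ignoring coding CPU cost and metadata overhead); each scheme is summarized by its storage/bandwidth factor and the number of tolerated disk losses.

    # Overhead comparison quoted in the excerpt above (a sketch, not from the paper's code).

    def replication_overhead(k: int) -> tuple:
        """k-way replication: storage/bandwidth factor and tolerated disk losses."""
        return float(k), k - 1

    def erasure_overhead(n: int, m: int) -> tuple:
        """(n, m) erasure code: storage/bandwidth factor and tolerated disk losses."""
        return n / m, n - m

    print(replication_overhead(3))   # (3.0, 2)  -> 3x the data size, survives 2 losses
    print(erasure_overhead(7, 4))    # (1.75, 3) -> 1.75x the data size, survives 3 losses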

Journal:
  • IJCAC (International Journal of Cloud Applications and Computing)

Volume 2, Issue 1

Pages 71-82

Publication date: January-March 2012